perm filename XPUB.DOC[2,TES] blob sn#036551 filedate 1973-04-18 generic text, type T, neo UTF8
                                                       April 18, 1973


                  T A B L E   O F   C O N T E N T S
                  _ _ _ _ _   _ _   _ _ _ _ _ _ _ _




                               SECTION




I      BLOCK DIAGRAM


II     THE MANUSCRIPT

            II-A      TEXT EXPRESSIONS

            II-B      GLYPHS

            II-C      THE DEVICE SPECIFICATION

            II-D      THE FORMATTER


III    GALLEYS

            III-A     THE PAGINATOR

            III-B     THE POLISHER AND THE DOCUMENT

            III-C     THE PRINTER/VIEWER


IV     THE REGISTRY

            IV-A      GLYPH FILES


V      THE MLISP EXTENSION

            V-A       ADVANTAGES AND DISADVANTAGES OF MLISP


VI     STANDARDS


VII    REALIZATION


VIII   PROPOSAL FOR GRAPHICS LANGUAGE


IX     FIGURE 1


X      FIGURE 2









               A PROPOSAL FOR THE NEW DOCUMENT SYSTEM

             Larry Tesler, Brian Harvey, Lester Earnest,
                   Tovar Mock, and Robert Sproull





The new system has two main purposes:

(1) To provide a means for flexible production of medium-to-high
quality documents such as technical reports, manuals, theses, and
books which may include text, line drawings, half-tone images, and
mathematical symbolism.

(2) To provide a standard representation for such documents that can
be printed or displayed on various kinds of output devices by various
kinds of computers with reasonable results.

The proposed participants in development of the new system are
Stanford University, Carnegie-Mellon University, and Xerox Palo Alto
Research Center.

This proposal was prepared by the Palo Alto Committee, consisting of
Stanford and Xerox people.  It deals with the overall organization of
the system and with details pertaining to front-end processing.  The
Pittsburgh committee at CMU has prepared a proposal dealing with the
tail-end of the system.  The two proposals shall be exchanged as well
as submitted to other interested parties for comment, criticism, and
reconciliation.


                              SECTION I
                              _______ _

                            BLOCK DIAGRAM
                            _____ _______




A possible block diagram of the proposed system is shown in Figure 1.
Dash-boxes represent computer files; plus-boxes represent visible
copy; starred boxes represent programs.

The system starts with a "scribble" in an author's head or on paper.
Using a conventional TEXT EDITOR, the author prepares a "manuscript"
file encoded in a PUB-like language.  The manuscript is fed to the
FORMATTER program which produces a "galley proof".  The galley may be
printed (or displayed) by a PRINTER/VIEWER program to be proofread by
the author for errors.  To correct errors, changes are made to the
manuscript and the FORMATTER is run again.

Once an acceptable galley proof is obtained, it is fed to the
PAGINATOR and POLISHER programs, which produce a "document" file.
This file may be printed (or displayed) by the PRINTER/VIEWER
program.  Again, if errors are discovered, corrections must be made
in the manuscript and the cycle repeated.

Auxiliary programs and files that appear in the block diagram will be
explained in subsequent sections.

This system has an inherent flaw.  It is not possible to create non-
rectangular columns of text, because the FORMATTER does not know
where page boundaries will fall.  Non-rectangular columns are useful
for displaying insets.

To remedy this flaw, an alternative system organization, that of PUB,
is under consideration: formatting and pagination are performed in a
single pass with backtracking.  For example, at the beginning of a
"group" (all to appear on one page), a decision point (choice/failset
point) is set with two choices: continue on this page or go to the
next page.  If the page runs out before the group is finished, a
failure resets the state to that at the beginning of the group and
the alternate choice is made.
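
The decision-point scheme can be sketched in Python.  This is a
hypothetical illustration, not code from PUB or the new system: each
group is reduced to a single height, PAGE_HEIGHT is an assumed page
capacity, and the only state saved at a decision point is the list of
pages built so far.

```python
# Hypothetical sketch of single-pass pagination with backtracking.
# Each group is reduced to its total height; a group must fit on one page.
PAGE_HEIGHT = 10  # assumed page capacity, in arbitrary units


def paginate(groups, start=0, pages=None):
    """Place groups[start:] on pages; return pages of group indices, or None."""
    if pages is None:
        pages = [[]]
    if start == len(groups):
        return pages
    height = groups[start]
    # Decision point with two choices: continue on this page, or go to next page.
    for new_page in (False, True):
        # Work on a copy, so a failure leaves the state at the group's start.
        trial = [list(page) for page in pages]
        if new_page:
            trial.append([])
        used = sum(groups[i] for i in trial[-1])
        if used + height > PAGE_HEIGHT:
            continue  # failure: the page ran out before the group was finished
        trial[-1].append(start)
        result = paginate(groups, start + 1, trial)
        if result is not None:
            return result
    return None  # both choices failed; an earlier decision point is revisited
```

For example, paginate([4, 5, 3, 6]) moves the third group to a second
page and returns [[0, 1], [2, 3]], while paginate([4, 11]) returns None
because no choice can accommodate the second group.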

In PUB, backtracking was slow and overly restricted because of the
limitations of SAIL.  A new system using this organization would be
written in a language with a built-in backtracking capability.

The disadvantage of the combined FORMATTER/PAGINATOR is that galleys
are not available as early for proofreading.  The combined version
would probably run 1.5 to 3 times slower than the FORMATTER alone,
depending on the amount of backtracking required (estimates based on
experience with programming language compilers).

Most of this document assumes separate FORMATTER and PAGINATOR
programs.  However, the implications for a combined system should be
obvious.




                             SECTION II
                             _______ __

                           THE MANUSCRIPT
                           ___ __________




The manuscript contains sufficient information for the system to
compute the document without human intervention.  Thus, the system is
basically non-interactive.  However, this does not preclude provision
for optional interaction at appropriate points for debugging and
advising purposes.

The manuscript is actually a computer program in the as yet unnamed
language P.  P is similar to PUB except that PUB is an augmented
subset of SAIL while P is an extension of MLISP.  The complete
facilities of MLISP are available to the author, including variables,
arrays, for-statements, recursion, list structures, function
declarations, and interaction.

Among the extensions to MLISP in P are "text expressions", "math
expressions", "calligraphic expressions", "image expressions",
"portion declarations", "area declarations", and "group
declarations".



II-A.  TEXT EXPRESSIONS
       ____ ___________

Text expressions are equivalent to "paragraphs" in PUB.  A text
expression has a syntax such as "a blank line followed by an indented
line followed by several unindented lines".  The compiler translates
it to another format according to similar syntax specifications.
Examples of text expression formats might be "prose", "quotation",
"table", "heading", and "Algolprogram".

A text expression is composed of "words" and each word is composed of
"virtual glyphs" (formerly called "characters").  An example of a
virtual glyph (or "virgle") is "Small Seriph Italic Upright Black
Alpha".  A "Glyph Map" fed to the system along with the manuscript
maps virgles into "actual glyphs" or "augles".  For example, the
glyph map may say that "Small" is "8 point", "Seriph" is "Elzevir",
and "Alpha" is "Greek 101".  Or it may map all sizes into one, all
fonts into LPTFONT, and all glyph-sets into ASCII characters.
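
A glyph map of this kind can be sketched in Python as a table of
attribute translations.  The attribute names, the dictionary layout,
and the to_augle helper are assumptions for illustration; only the
values Small, Seriph, Alpha, 8 point, Elzevir, Greek 101, and LPTFONT
come from the text, and "ASCII a" is a hypothetical line-printer
substitute.

```python
# Hypothetical Glyph Map sketch: virtual attribute values -> actual ones.
# A rule is either a table of value translations or a single string that
# every value of that attribute collapses into.

FINE_MAP = {
    "size": {"Small": "8 point"},
    "font": {"Seriph": "Elzevir"},
    "glyph": {"Alpha": "Greek 101"},
}

LPT_MAP = {
    "size": "one size",             # all sizes map into one
    "font": "LPTFONT",              # all fonts map into LPTFONT
    "glyph": {"Alpha": "ASCII a"},  # hypothetical ASCII substitute
}


def to_augle(virgle, glyph_map):
    """Translate each attribute of a virgle into its actual-glyph value."""
    augle = {}
    for attribute, value in virgle.items():
        rule = glyph_map.get(attribute)
        if rule is None:
            augle[attribute] = value       # unmapped attributes pass through
        elif isinstance(rule, str):
            augle[attribute] = rule        # collapse every value into one
        else:
            augle[attribute] = rule.get(value, value)
    return augle


virgle = {"size": "Small", "font": "Seriph", "style": "Italic",
          "orientation": "Upright", "color": "Black", "glyph": "Alpha"}
```

The same virgle yields an 8-point Elzevir Greek alpha under FINE_MAP
and an LPTFONT character under LPT_MAP, so the manuscript itself does
not change when the output device does.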

The glyph map is conceptually an n-dimensional sparse array of
functions.  For example, "Large Seriph Italic A" may be specified as
appearing explicitly in a certain glyph file or may be specified as a
scale-reduction applied to an oversize glyph.
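
A minimal Python sketch of the glyph map as a sparse array of
functions, assuming coordinates are (size, font, style, character)
tuples and that a glyph is a plain dictionary.  Only populated cells
are stored.  The glyph file names, codes, and the 0.25 reduction
factor are hypothetical.

```python
# Hypothetical sketch: the glyph map as a sparse n-dimensional array of
# functions, stored as a dictionary keyed by coordinate tuples.
from typing import Callable, Dict, Tuple

Coordinates = Tuple[str, str, str, str]   # (size, font, style, character)
GlyphMaker = Callable[[], dict]


def from_file(file_name: str, code: int) -> GlyphMaker:
    """Cell whose glyph appears explicitly in a glyph file."""
    return lambda: {"file": file_name, "code": code, "scale": 1.0}


def scaled(source: GlyphMaker, factor: float) -> GlyphMaker:
    """Cell whose glyph is a scale-reduction of another cell's glyph."""
    def make() -> dict:
        glyph = dict(source())
        glyph["scale"] *= factor
        return glyph
    return make


glyph_map: Dict[Coordinates, GlyphMaker] = {
    # Stored explicitly in a (hypothetical) glyph file.
    ("Large", "Seriph", "Italic", "A"): from_file("SERIFL.GLF", 65),
    # Derived by reducing an oversize glyph from another (hypothetical) file.
    ("Large", "Seriph", "Italic", "B"): scaled(from_file("POSTER.GLF", 66), 0.25),
}


def lookup(coords: Coordinates) -> dict:
    """Evaluate the function stored at the given coordinates."""
    return glyph_map[coords]()
```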



II-B.  GLYPHS
       ______

Among the n coordinates that define a glyph are:

(1) Code.  A small integer selecting a particular character out of a
character set.

(2) Set.  A set of up to 91 characters, e.g., Greek Alphabet, Math
symbols 1, Accents.

(3) Case.  Upper, Lower.  Differs only for letters in alphabets.

(4) Style.  Light, Bold, Italic, Bold Italic, Demibold, etc.

(5) Font.  Caslon, Elzevir, Times Roman, Lptfont, Datadiscfont,
JohnDoefont.

(6) Size.  Measured in Points.  The P language has point-pica-inch
conversion primitives.

(7) Orientation.  Upright or some other angle between 0 and 360
degrees.

(8) Thickness.

(9) Texture.

(10) Color.
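
The coordinates above could be carried as one record per glyph, as in
this Python sketch.  The field names, the default values, the 0-based
code, and the reading of "between 0 and 360 degrees" as the range
[0, 360) are assumptions; Thickness, Texture, and Color are left
optional because the text does not define them further.

```python
# Hypothetical record for the glyph coordinates listed above.
from dataclasses import dataclass
from typing import Optional

MAX_SET_SIZE = 91  # a set holds up to 91 characters


@dataclass(frozen=True)
class GlyphCoordinates:
    code: int                     # selects a character within the set (0-based here)
    char_set: str                 # e.g. "Greek Alphabet", "Math symbols 1", "Accents"
    case: str = "Upper"           # "Upper" or "Lower"; differs only for letters
    style: str = "Light"          # "Light", "Bold", "Italic", "Demibold", ...
    font: str = "Times Roman"     # assumed default
    size: float = 10.0            # in points; assumed default
    orientation: float = 0.0      # degrees; 0 means upright
    thickness: Optional[float] = None
    texture: Optional[str] = None
    color: Optional[str] = None

    def __post_init__(self):
        if not 0 <= self.code < MAX_SET_SIZE:
            raise ValueError("code must select one of at most 91 characters")
        if not 0 <= self.orientation < 360:
            raise ValueError("orientation must lie in [0, 360) degrees")
```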



II-C.  THE DEVICE SPECIFICATION
       ___ ______ _____________

A "Device Specification" file must be fed to the system along with
the manuscript and the Glyph Map.  Conceptually, the Device
Specification defines a printing or viewing device as a set of
attributes such as RASTERSCAN, 200PPI, 2FONTS, NOGRAYSCALE.
Actually, the file is a collection of MLISP DEFPROPs and procedures
through which the FORMATTER, PAGINATOR, and POLISHER programs filter
the manuscript to obtain a document that can be processed by the
PRINTER/VIEWER program for the specified device.

Keeping such procedures on a separate file (usually in LAP form for
efficiency) keeps the kernel system small even when new devices are
added to its capability.
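
The Device Specification's role as a filter can be sketched in
Python.  In this hypothetical version a property list stands in for
the DEFPROPs and plain functions stand in for the LAP procedures; the
directive format, the property names, and both filters (folding extra
fonts onto available ones, replacing half-tones by empty boxes on a
device with no gray scale) are assumptions for illustration.

```python
# Hypothetical Device Specification: properties plus filter procedures
# through which every directive passes before it reaches the P/V.


def fold_fonts(directive, spec):
    """On a device with few fonts, map any other font onto the first one."""
    fonts = spec["properties"]["FONTS"]
    if "font" in directive and directive["font"] not in fonts:
        return {**directive, "font": fonts[0]}
    return directive


def drop_halftones(directive, spec):
    """On a device without gray scale, replace a half-tone image by an empty box."""
    if directive["kind"] == "image" and not spec["properties"]["GRAYSCALE"]:
        return {"kind": "box", "width": directive["width"], "height": directive["height"]}
    return directive


EXAMPLE_SPEC = {  # roughly RASTERSCAN, 200PPI, 2FONTS, NOGRAYSCALE
    "properties": {"RASTERSCAN": True, "PPI": 200,
                   "FONTS": ["Elzevir", "Caslon"], "GRAYSCALE": False},
    "filters": [fold_fonts, drop_halftones],
}


def apply_device_spec(directives, spec):
    """Pass every directive through each filter procedure in turn."""
    result = []
    for directive in directives:
        for procedure in spec["filters"]:
            directive = procedure(directive, spec)
        result.append(directive)
    return result
```

Supporting a new device then means writing a new specification rather
than changing the kernel programs.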

The PRINTER/VIEWER program and the Device Specification File are
provided by each installation for each of its devices.  It may be
possible in some cases for an installation to use a single P/V and
Device Spec for several devices.  In such a case, a single document
file could be printable on all of them.



II-D.  THE FORMATTER
       ___ _________

The FORMATTER program is similar to the PARSER and FILLER modules of
PUB.  The PARSER is replaced by the MLISP compiler and the LISP
system.  The FILLER is replaced by modules for text, math, line-
drawings, and images.  The pagination capabilities of PUB are
intentionally omitted to simplify the FORMATTER and to allow more
complex capabilities to be handled by the PAGINATOR program.

During operation of the FORMATTER, the author can monitor its
progress on a terminal, interrupt it at landmark points, and interact
with it at breakpoints and error points.

The FORMATTER may generate tables of contents, indices, etc., in
manuscript format as in PUB.  If it does, it swaps in an ALPHABETIZER
program to sort the indices; the FORMATTER is then swapped back in to
process the generated portions.
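
The ALPHABETIZER step might look like the following Python sketch.
Because pages are not known until the PAGINATOR runs, this
hypothetical version pairs each index term with cross-reference
labels rather than page numbers; the output line format is an
assumption.

```python
# Hypothetical ALPHABETIZER step: sort index entries collected by the
# FORMATTER and emit them as manuscript text for a further FORMATTER pass.
from collections import defaultdict


def alphabetize(entries):
    """entries: (term, label) pairs in the order the FORMATTER met them."""
    labels_by_term = defaultdict(list)
    for term, label in entries:
        if label not in labels_by_term[term]:   # drop duplicate references
            labels_by_term[term].append(label)
    # Case-insensitive order, ties broken by the original spelling.
    terms = sorted(labels_by_term, key=lambda t: (t.lower(), t))
    return [f"{term}, {', '.join(labels_by_term[term])}" for term in terms]
```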

A hyphenation capability is included in the text module for those who
like it.

The manuscript is structured into one or more portions, each of which
may be divided into fragments.  Non-global declarations are local to
portions and to fragments (unlike PUB).  Thus, it is possible to
format fragments independently.  At the end of each fragment, the
system will remember the states of the few counters and other
variables that affect the processing of the next fragment.
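
A Python sketch of independent fragment formatting, under the
assumption that the only carried-over state is a footnote counter and
a figure counter.  The item format and the counter names are
hypothetical.

```python
# Hypothetical sketch: fragments are formatted independently, and only the
# few counters that the next fragment depends on are remembered.
INITIAL_STATE = {"footnote": 0, "figure": 0}


def format_fragment(items, counters):
    """Number footnotes and figures in one fragment.

    Returns the formatted items and the counter state at the fragment's end.
    """
    counters = dict(counters)  # copy, so a saved state is never modified
    output = []
    for kind, text in items:
        if kind in counters:
            counters[kind] += 1
            output.append(f"{kind} {counters[kind]}: {text}")
        else:
            output.append(text)
    return output, counters


def format_document(fragments):
    """Format every fragment and remember the state at the end of each."""
    state, results, saved_states = INITIAL_STATE, [], []
    for items in fragments:
        output, state = format_fragment(items, state)
        results.append(output)
        saved_states.append(state)
    return results, saved_states
```

After an edit to fragment k (k > 0), only that fragment needs to be
reformatted: format_fragment(fragments[k], saved_states[k - 1]) starts
it from the remembered state.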




                             SECTION III
                             _______ ___

                               GALLEYS
                               _______




The FORMATTER outputs two files called the "galley" and the "galley
guide" (analogous to the PUInS.PUI and the PUIn.PUI files of PUB).

The galley contains text, drawing directives, and image directives,
with sufficient information so that the Printer/Viewer program can
display it provisionally justified but not paginated.  There is a
single column for each section.  Footnotes and diagrams appear close
after the text which references them. Cross-references are not
resolved.

The galley guide is an abstract of the galley in which content is
omitted, size information is elaborated, and pagination directives
are carried forward.  The galley guide contains sufficient
information for the PAGINATOR program to lay out the document into
pages, areas, boxes, and columns.



III-A.  THE PAGINATOR
        ___ _________

The PAGINATOR program reads only the galley guide, not the galley.
It essentially juggles rectangles and possibly other shapes to fit
them into pages, areas, and columns, keeping groups together, placing
footnotes below their referents, and keeping figures near the texts
that describe them.

The PAGINATOR needs to know device specifications but nothing about
glyphs.  It also needs to know the author's pagination directives
from the manuscript.  These can all be found in the galley guide.

The principal output of the PAGINATOR is the "Page Guide".  This is
probably in the same format as the Galley Guide, but its content is
sorted, structured, and pruned.

Whenever the PAGINATOR completes a page, it writes all cross-
reference labels that appeared on that page onto a file called the
"Cross-Reference Table" (CRT? -- no, XRT!).
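
A simplified PAGINATOR can be sketched in Python.  It works only from
a hypothetical galley guide (one box per group, with a height and the
cross-reference labels it contains), keeps each box whole, and writes
each completed page's labels to the XRT.  Footnote placement, areas,
and columns are omitted, and a box taller than a page is simply placed
on a page of its own.

```python
# Hypothetical PAGINATOR sketch: reads only a galley guide and produces a
# page guide plus a Cross-Reference Table (XRT).
from typing import Dict, List, NamedTuple, Tuple


class Box(NamedTuple):
    height: int
    labels: Tuple[str, ...] = ()   # cross-reference labels inside this box


def paginate(guide: List[Box], page_height: int):
    """Fill pages in order; return (page guide, XRT)."""
    pages: List[List[int]] = [[]]
    xrt: Dict[str, int] = {}
    used = 0

    def complete_page():
        # Write every label that appeared on the finished page to the XRT.
        for index in pages[-1]:
            for label in guide[index].labels:
                xrt[label] = len(pages)   # pages are numbered from 1

    for index, box in enumerate(guide):
        if pages[-1] and used + box.height > page_height:
            complete_page()
            pages.append([])
            used = 0
        pages[-1].append(index)   # boxes are never split: groups stay together
        used += box.height
    complete_page()
    return pages, xrt
```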



III-B.  THE POLISHER AND THE DOCUMENT
        ___ ________ ___ ___ ________

Some Printer/Viewer programs may be sophisticated enough to read the
galley, the page guide, and the XRT and display a finished document
(see dotted line in Figure 1).  However, the normal procedure is to
feed them to the POLISHER program, which produces a well-ordered
"document" file in which pages are together and cross-references are
resolved.  This file is easily handled by the P/V.
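
The POLISHER's two jobs, bringing each page's material together and
resolving cross-references, can be sketched in Python.  The
{ref:label} marker syntax, the list-of-strings galley, and the
??label?? flag for unresolved labels are assumptions for illustration.

```python
# Hypothetical POLISHER sketch: reorder galley items by page and resolve
# cross-references from the XRT.
import re

REFERENCE = re.compile(r"\{ref:([A-Za-z0-9_-]+)\}")   # assumed marker syntax


def resolve_references(text, xrt):
    """Replace each {ref:label} marker by the page number recorded in the XRT."""
    def page_of(match):
        label = match.group(1)
        # Leave unresolved labels visible so the proofreader notices them.
        return str(xrt[label]) if label in xrt else f"??{label}??"
    return REFERENCE.sub(page_of, text)


def polish(galley, page_guide, xrt):
    """Build the document: one list of resolved galley items per page."""
    return [[resolve_references(galley[i], xrt) for i in page] for page in page_guide]
```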



III-C.  THE PRINTER/VIEWER
        ___ ______________

This device-dependent program can print either the galley or the
polished document, because both files are in the same format.

For raster devices, the P/V may have two passes.  One generates bit
matrices from vector/text representations, while the other actually
prints the matrices.
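
The two passes can be sketched in Python for the simplest case of
straight line segments.  Bresenham's line algorithm is used here only
as a familiar rasterization method; the proposal does not say how bit
matrices are to be generated, and printing is simulated by returning
text rows.

```python
# Hypothetical two-pass raster P/V sketch.
# Pass 1 turns vector line segments into a bit matrix (Bresenham's line
# algorithm, chosen for illustration); pass 2 "prints" the matrix.


def rasterize(segments, width, height):
    """Pass 1: return a height x width matrix of 0/1 bits."""
    bits = [[0] * width for _ in range(height)]
    for (x0, y0), (x1, y1) in segments:
        dx, dy = abs(x1 - x0), -abs(y1 - y0)
        sx = 1 if x0 < x1 else -1
        sy = 1 if y0 < y1 else -1
        err = dx + dy
        while True:
            if 0 <= x0 < width and 0 <= y0 < height:
                bits[y0][x0] = 1          # set the bit under the current point
            if x0 == x1 and y0 == y1:
                break
            e2 = 2 * err
            if e2 >= dy:                  # step in x
                err += dy
                x0 += sx
            if e2 <= dx:                  # step in y
                err += dx
                y0 += sy
    return bits


def print_matrix(bits):
    """Pass 2: emit each raster line; '#' marks a set bit."""
    return "\n".join("".join("#" if b else "." for b in row) for row in bits)
```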

The P/V program may be parametric at the option of the installation.
In some cases, it may be possible to substitute certain fonts for
others, to change the resolution specification, or to select
particular pages for output.

The P/V is the only program that looks at the actual images of
glyphs.  These glyphs are in a form appropriate to the device, e.g.,
octal code, bit matrix, vector outline.  The actual image is normally
computed from a contour representation extracted from the Registry.




                             SECTION IV
                             _______ __

                            THE REGISTRY
                            ___ ________




There is a Network Registry of Glyphs as well as local registries.  A
document referring to a local registry cannot be transmitted over
the Network.  Use of local registries should be limited to storing
new glyphs that have not had an opportunity to be registered in the
Network Registry.

The Registry consists of a Glossary and a Directory.

The Glossary lists the available Sets, Cases, Styles, Fonts, and so
forth.  There is a procedure for adding new entries to the Glossary,
e.g., the Russian alphabet to the Set Glossary or Clarendon to the
Font Glossary.  It is also possible to add new characters to existing
incomplete sets.

The Directory lists every Glyph File registered by a participating
installation, including its coordinates in the sparse array, complete
file name, and site name.  The coordinates must use the
terminology of the Glossary.

It is not permissible to change a glyph file once it has been
registered in the Directory.

A font book will be published periodically to help people find what
they need in the Registry.
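
The Registry rules above (Directory entries must use Glossary terms,
new terms are added by a procedure, and a registered glyph file may
not be changed) can be sketched as a small Python class.  The class,
its methods, and the file and site names in the usage lines are
hypothetical.

```python
# Hypothetical Registry sketch: a Glossary of permitted terms and a
# Directory of glyph files that cannot be changed once registered.


class Registry:
    def __init__(self, glossary):
        # glossary: coordinate name -> permitted terms, e.g. {"font": {"Elzevir"}}
        self.glossary = {name: set(terms) for name, terms in glossary.items()}
        self.directory = {}   # coordinates -> (file name, site name)

    def add_term(self, coordinate, term):
        """Add a new Glossary entry, e.g. a new alphabet or font."""
        self.glossary.setdefault(coordinate, set()).add(term)

    def register(self, coordinates, file_name, site):
        """Enter a glyph file in the Directory, using Glossary terms only."""
        for name, term in coordinates.items():
            if term not in self.glossary.get(name, set()):
                raise ValueError(f"{term!r} is not in the {name} Glossary")
        key = tuple(sorted(coordinates.items()))
        if key in self.directory:
            raise PermissionError("a registered glyph file may not be changed")
        self.directory[key] = (file_name, site)


registry = Registry({"set": {"Latin", "Greek"}, "font": {"Elzevir"}})
registry.register({"set": "Greek", "font": "Elzevir"}, "ELZGRK.GLF", "SITE-A")
```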



IV-A.  GLYPH FILES
       _____ _____

Each Network Glyph File defines a set of glyphs.  The file header
contains geometric information needed by the FORMATTER and POLISHER
programs, such as height, width, kerning profiles, and transformation
clues for changing scale, orientation, and thickness.  The remainder
of the file contains a curved contour representation of each glyph.

Each local installation is expected to have its own GLYPH CONVERTER
to generate local glyph files (see Figure 2).  The headers are simply
copied from Network Glyph Files, possibly changing scale,
orientation, and thickness.  The contours are converted to bit
matrices or vector outlines as appropriate.

In the case of trivial devices such as line printers, trivial glyph
files should be produced by the installation.  However, it is
important to stay within the framework of the registry.  For example,
if the LPT has an integral sign, it should be specified in the glyph
map as, say, "math-set 63" rather than as "latin-set 14".  The local
math-set glyph file would then specify that glyph 63 is really octal
14 on the LPT.  Other glyphs in the local math-set file would have no
good representation on the LPT.
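
A local glyph file for such a printer can be as small as a table from
registry codes to printer codes, as in this Python sketch.  The entry
for math-set glyph 63 (octal 14) comes from the example above; the
table layout and the "?" substitute for glyphs with no good LPT
representation are assumptions.

```python
# Hypothetical local glyph files for a line printer: for each registry set,
# a table from registry glyph codes to LPT character codes.
LOCAL_GLYPH_FILES = {
    "math-set": {63: 0o14},   # glyph 63, the integral sign, is octal 14 on the LPT
}
SUBSTITUTE = ord("?")         # assumed stand-in for glyphs the LPT cannot print


def lpt_code(set_name, code):
    """Translate a registry glyph (set, code) into an LPT character code."""
    return LOCAL_GLYPH_FILES.get(set_name, {}).get(code, SUBSTITUTE)
```

Because the manuscript still asks for math-set 63, a device whose
local glyph file holds a true integral sign prints it with no change
to the document.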




                              SECTION V
                              _______ _

                         THE MLISP EXTENSION
                         ___ _____ _________




If the separate FORMATTER/PAGINATOR organization is followed, several
simple changes to MLISP will be made:

(1) Contraction.  Some features that would be useless to the system
and to most authors will be removed in the interest of saving space.
Authors needing these features could LAP them in.

(2) Macros.  The MLISP "DEFINE" only replaces one token by another.
Macros in P must be able to replace either an identifier or a
sequence of delimiters by an arbitrary sequence of tokens.  Invisible
tokens such as spaces, tabs, and line boundaries must be recognized
as tokens in text expressions of P.

(3) Strings.  The LISP string facilities are different in every
system and inadequate in all.  P will have its own string package
with a few primitives to be encoded in LAP for each object machine.
A string will be a series of glyphs; thus, the package would compute
widths and heights of text units such as words at high speed.
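
Computing the extent of a word from per-glyph metrics is the kind of
primitive such a string package would provide.  This Python sketch
assumes metrics come from glyph-file headers as (width, height) pairs;
the numbers are invented for illustration, and kerning is ignored.

```python
# Hypothetical string-package primitive: the extent of a word computed from
# per-glyph metrics, as a glyph-file header might supply them.
from typing import Dict, Tuple

# (font, character) -> (width, height), in assumed units; invented values.
METRICS: Dict[Tuple[str, str], Tuple[int, int]] = {
    ("Elzevir", "P"): (7, 10),
    ("Elzevir", "U"): (8, 10),
    ("Elzevir", "B"): (7, 10),
    ("Elzevir", "."): (3, 2),
}


def word_extent(word: str, font: str) -> Tuple[int, int]:
    """Width is the sum of glyph widths; height is that of the tallest glyph."""
    metrics = [METRICS[(font, char)] for char in word]
    width = sum(w for w, _ in metrics)
    height = max((h for _, h in metrics), default=0)
    return width, height
```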

If a combined FORMATTER/PAGINATOR organization is followed, the
system will be written in LISP70 to take advantage of backtracking,
syntax-directed computation, coroutines, and edit strings.  Macros
will have to be added to the scanner.
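
The macro facility described in item (2) can be sketched in Python,
with spaces, tabs, and line boundaries kept as tokens.  The token
classes, the longest-match rule, and the decision not to rescan
expansions are assumptions; the macros in the example are hypothetical.

```python
# Hypothetical sketch of P-style macros: a pattern is an identifier or a
# sequence of delimiters (including invisible ones), replaced by any tokens.
import re

# An identifier, one invisible token (space, tab, line boundary), or any
# other single delimiter character.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*|[ \t\n]|[^A-Za-z \t\n]")


def tokenize(text):
    return TOKEN.findall(text)


def expand(tokens, macros):
    """Replace macro patterns (tuples of tokens), longest match first.

    Replacement tokens are not rescanned for further macros.
    """
    longest = max((len(pattern) for pattern in macros), default=0)
    out, i = [], 0
    while i < len(tokens):
        for size in range(min(longest, len(tokens) - i), 0, -1):
            pattern = tuple(tokens[i:i + size])
            if pattern in macros:
                out.extend(macros[pattern])
                i += size
                break
        else:
            out.append(tokens[i])   # no macro starts here
            i += 1
    return out


MACROS = {
    ("<", "="): ["LE"],                  # a delimiter sequence
    ("\n", "\n"): ["\n", "PAR", "\n"],   # a blank line (two line boundaries)
    ("PUB",): ["P"],                     # an identifier
}
```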



V-A.  ADVANTAGES AND DISADVANTAGES OF MLISP
      __________ ___ _____________ __ _____

Among the advantages of an MLISP implementation of the new system
are:

(1) Efficiency.  The language will be processed by an extension of
the existing MLISP compiler, which translates at 3000 lines per
minute, more than three times faster than PUB Pass One.  Most PUB
macros could be procedures (EXPRs and FEXPRs) in P, so their
execution will be several times faster than in PUB (PUB spends much
of its time expanding macros).

(2) Flexibility.  Author procedures could directly call or redefine
procedures in the system.  During debugging, the author could set
breakpoints and perform traces.

(3) Portability.  The extended MLISP compiler will be written mostly
in STANDARD LISP, so that it will be transportable to new
installations with a minimum of effort.

The system should run equally well (except for speed differences) in
LISP1.6, TENEX-LISP, ILSP, MACLISP, and LISP70.  With a small amount
of LAP programming, it should run in LISPs on computers other than
the PDP-10 as well.

Disadvantages of MLISP are:

(1) Size.  The LISP1.6 version of the FORMATTER will probably be
nearly as large as PUB Pass One, because of LISP and MLISP overhead.
This will be remedied when LISP70 is operational.

(2) Inefficiency.  The PAGINATOR and POLISHER may be simple enough to
be programmed in machine language at a substantial gain in
efficiency.  This may be done after portable LISP versions are
operational.

A LISP70 implementation will be of comparable efficiency.
Backtracking will tend to slow it down while data type declarations
will tend to speed it up.  It will be portable because LISP70 is
portable, and flexibility will be improved because of the extensible
nature of the language.




                             SECTION VI
                             _______ __

                              STANDARDS
                              _________




The following file formats shall be standardized:

(1) Individual Documents

        a. Manuscript.
        b. Galley and Document (same format).
        c. Cross-Reference Table.
        d. Galley Guide.
        e. Page Guide (similar to d?).

(2) Registry

        a. Glossary
        b. Directory
        c. Glyph File Header
        d. Curved Contour Representation

The following programs shall be written in portable fashion:

(1) FORMATTER

(2) PAGINATOR

(3) POLISHER




                             SECTION VII
                             _______ ___

                             REALIZATION
                             ___________




Manuscript and Registry standards shall be proposed by Palo Alto and
Galley and Document standards by Pittsburgh.

The FORMATTER shall be programmed by Rich Johnson and Brian Harvey
with assistance from Larry Tesler.

The PAGINATOR and POLISHER shall be programmed at CMU.

MLISP extensions shall be made at Stanford.  The ILSP implementation
will be maintained by CMU, the LISP1.6 (and later LISP70)
implementations by Stanford, and the TENEX-LISP implementation by
Xerox.

If LISP70 is used, Stanford will maintain the system.

Each installation shall provide its own glyph converters, text
editors, device specifications, and printer/viewers.  However, the
possibility of collaborating on XGP service should be explored as the
project proceeds.  CMU shall be the motivating force and shall do
most of the programming.

A target date of August 15 is suggested for a first version of the
system.  Although only a subset will be implemented in the first
version, the framework for supplying the remainder must be provided.

This optimistic estimate is based on the fact that PUB was completed
in six months by one person in an inappropriate language.  The new
implementation is simplified by separating pagination from filling
and by building on an existing compiler.  Although the new system has
many sophisticated facilities, they have all been done before in some
form by some of the implementors.




                            SECTION VIII
                            _______ ____

                   PROPOSAL FOR GRAPHICS LANGUAGE
                   ________ ___ ________ ________




by Robert Sproull

This is an editor's abstract of a typewritten document.

A document is composed of "boxes" with geometry, marked where page
breaks can occur.  Each box has a "body" and "i.d. info".  The body
has printing rules.  The i.d. has names for subtitling and
positioning relative to other boxes.  Processing within each box is
independent, allowing for incremental compilation of a document.

LISP procedures are more useful than macros, e.g., to specify line
drawings in the graphics section.

Line-drawing primitives are suggested: absolute/relative point/line,
line or curve with thickness and texture, string (caption), device-
dependent code.

Floating-point coordinate system chosen by user.

Curves are given in terms of endpoints and control points.  The
latter are not necessarily on the curve, but guide the fitter.
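
One common way to specify a curve by endpoints and control points is a
cubic Bezier curve, sketched below in Python.  It is named here only
as an illustration; the abstract does not say which fitting method the
proposal intends.

```python
# One possible curve scheme: a cubic Bezier curve (chosen for illustration).


def bezier_point(p0, p1, p2, p3, t):
    """Point at parameter t in [0, 1]; p0, p3 are endpoints, p1, p2 control points."""
    s = 1 - t
    x = s**3 * p0[0] + 3 * s**2 * t * p1[0] + 3 * s * t**2 * p2[0] + t**3 * p3[0]
    y = s**3 * p0[1] + 3 * s**2 * t * p1[1] + 3 * s * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)


def flatten(p0, p1, p2, p3, steps=8):
    """Approximate the curve by a polyline of steps + 1 points, e.g. for a vector device."""
    return [bezier_point(p0, p1, p2, p3, i / steps) for i in range(steps + 1)]
```

With endpoints (0, 0) and (1, 0) and control points (0, 1) and (1, 1),
the curve's midpoint is (0.5, 0.75): the control points shape the
curve without lying on it.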

Program must be able to interrogate the state, including questions
like "How many inches would a vector of length dx,dy occupy?".  Other
questions: resolution, string dimensions, aspect ratio.

A display procedure (cf. Newman, CACM) has arguments, prog variables,
and also a "master rectangle" within which it can draw.  A display
procedure call may optionally specify the instance rectangle, as well
as location, rotation, scale, and transform matrix.  The system
automatically applies these transformations from the user's
coordinate system to the page.
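
The transformations implied by a display-procedure call can be
sketched with 3x3 homogeneous matrices in Python.  The function names,
the order of operations (scale, then rotate, then translate), and the
mapping of a master rectangle onto an instance rectangle by independent
x and y scaling are assumptions; vector_length answers the kind of
state query described above.

```python
# Hypothetical sketch of display-procedure transformations using 3x3
# homogeneous matrices (user coordinates -> page coordinates).
import math


def instance_transform(location=(0.0, 0.0), rotation=0.0, scale=1.0):
    """Scale, then rotate by `rotation` degrees, then translate to `location`."""
    c, s = math.cos(math.radians(rotation)), math.sin(math.radians(rotation))
    tx, ty = location
    return [[scale * c, -scale * s, tx],
            [scale * s, scale * c, ty],
            [0.0, 0.0, 1.0]]


def rectangle_transform(master, instance):
    """Map a master rectangle (x0, y0, x1, y1) onto an instance rectangle."""
    (mx0, my0, mx1, my1), (ix0, iy0, ix1, iy1) = master, instance
    sx = (ix1 - ix0) / (mx1 - mx0)
    sy = (iy1 - iy0) / (my1 - my0)
    return [[sx, 0.0, ix0 - sx * mx0],
            [0.0, sy, iy0 - sy * my0],
            [0.0, 0.0, 1.0]]


def compose(outer, inner):
    """Matrix product: apply `inner` first, then `outer`."""
    return [[sum(outer[r][k] * inner[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def apply(matrix, point):
    """Transform a point from user coordinates to page coordinates."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


def vector_length(matrix, dx, dy):
    """How long a vector dx,dy becomes on the page (translation ignored)."""
    return math.hypot(matrix[0][0] * dx + matrix[0][1] * dy,
                      matrix[1][0] * dx + matrix[1][1] * dy)
```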

Display procedure calls draw within a "box" of given size as
mentioned earlier.



                             SECTION IX
                             _______ __

                              FIGURE 1
                              ______ _



BLOCK DIAGRAM -- PART 1 OF 2

               +++++++++++
              |  SCRIBBLE |
               +++++++++++
                    |
                    ∨
               ***********
              |TEXT EDITOR|
               ***********
                    |
                    ∨
               -----------
              |           |
              | MANUSCRIPT|
              |           |
               -----------
                    |
                    |<-------------------------
 +++++++++          ∨                          |
|         |    ***********      ************   |
| MONITOR |<--| FORMATTER |--->|ALPHABETIZER|--
|         |    ***********      ************
 +++++++++     |        |
               ∨        ∨
    ------------    ----------
   |            |  |          |
   |GALLEY GUIDE|  |  GALLEY  |
   |            |  |          |
    ------------    ---------- 



BLOCK DIAGRAM -- PART 2 OF 2

    ------------    ----------
   |            |  |          |
   |GALLEY GUIDE|  |  GALLEY  |-----
   |            |  |          |     |
    ------------    ----------      |
             |                      |
             ∨                      |
            ***********             |
           | PAGINATOR |            |
            ***********             |
             |       |              |
             ∨       ∨              |
     -----------   -----------      |
    |           | |   CROSS   |     |
    |   PAGE    | | REFERENCE |     |
    |   GUIDE   | |   TABLE   |     |
     -----------   -----------      |
             |           |        -----
             |           |       |     |
             ∨           ∨       ∨     |
             ---------------------     |
                 |             .       |
                 ∨             .       |
            ***********        .       |
           |  POLISHER |       .       |
            ***********        .       |
                 |             .       |
                 ∨             .       |
            -----------        .       |                  +++++++++
           |           |       ∨       ∨   *********     |HARD COPY|
           |  DOCUMENT |------------------| PRINTER |--->|   OR    |
           |           |                  | /VIEWER |    | DISPLAY |
            -----------                    *********      +++++++++



                              SECTION X
                              _______ _

                              FIGURE 2
                              ______ _



GLYPH CONVERTER

         ----------
        |          |
        | REGISTRY |
        |          |
         ----------
             |
             ∨
         ***********
        | CONVERTER |
         ***********
          |       |
          ∨       ∨
 ------------    ---------
|  GLYPH     |  |  GLYPH  |
|DESCRIPTIONS|  | IMAGES  |
 ------------    ---------
